Abstract —This letter investigates the problem of distributed
spectrum access for cognitive small cell networks. Compared 
with existing work, two inherent features are considered: i) the 
transmission of a cognitive small cell base station only interferes 
with its neighbors due to the low power, i.e., the interference is 
local, and ii) the channel state is time-varying due to fading. 
We formulate the problem as a robust graphical game, and 
prove that it is an ordinal potential game which has at least one 
pure strategy Nash equilibrium (NE). Also, the lower throughput 
bound of NE solutions is analytically obtained. To cope with 
the dynamic and incomplete information constraints, we propose
a distributed spectrum access algorithm that converges to
stable results. Simulation results validate the effectiveness of the
proposed game-theoretic distributed learning solution in time- 
varying spectrum environment. 

Index Terms —cognitive small cell networks, graphical game, 
ordinal potential game, stochastic learning automata. 

I. Introduction 

SMALL cells have been regarded as a promising approach
to meet the increasing traffic demand for 5G networks.
Compared with traditional cellular cells, small cells are
characterized by low cost, low power and short range, and hence
increase the spectrum spatial reuse significantly. Due to the
dynamic and random deployment, centralized optimization
approaches are not feasible; instead, distributed and
self-organizing optimization approaches are preferable [1]. In this
context, it is now realized that introducing cognitive ability [2]
into small cells, which are then referred to as cognitive small cells
(CSCs) [3], [4], would further improve the resource utilization.
CSCs are able to sense and observe, learn from past
information, make intelligent decisions, and adjust their operational
parameters such as access channels and transmitting power. In
this letter, we focus on the problem of distributed spectrum 
access, which is vital for cognitive small cell networks 0. 

To address distributed and autonomous decision-making,
game theory [5] is a powerful tool and has been applied
in small cell networks, e.g., a utility-based learning approach
[6], a reinforcement-learning based self-organizing scheme [7],
a coalitional game based scheme [8], an evolutionary game based
scheme [9] and a hierarchical dynamic game approach [10].
However, these studies share two limitations: i) the channel states are
assumed to be static, which is not always true since they are
actually time-varying in practical wireless environments, and
ii) it is assumed that the transmission of a small cell interferes
with all other cells, i.e., the interference is global, which is
also not always true since a small cell actually only interferes
with its neighbors due to the low transmitting power, i.e., the
interference is local. Thus, it is timely and important to develop
efficient distributed spectrum access approaches for cognitive
small cell networks with local interference and time-varying
channel states.

Due to the structure of local interaction, it is not easy to
incorporate game models in wireless networks with local
interactions/interference. To overcome this challenge, some
preliminary progress using graphical games has been achieved
for resource optimization in static wireless networks [12]-[16].
Specifically, the local altruistic game that can achieve
global optimization via neighbor cooperation for opportunistic
spectrum access was proposed in our previous work [13],
the properties of atomic congestion games on graphs were
analyzed in [14], graphical game models for MAC-layer
interference minimization were reported in [15], and a
graphical game model for dynamic spectrum sharing in TV
white space was studied in [16]. Methodologically, these models
and solutions, which were originally designed for static
scenarios, cannot be applied to dynamic small cell
networks with time-varying channel states, which makes the
problem both interesting and challenging.

In this letter, we formulate the problem of distributed 
spectrum access in cognitive small cell networks as a robust 
graphical game. It is proved that the formulated game is an 
ordinal potential game which has at least one pure strategy Nash
equilibrium (NE); in addition, the lower throughput bound of 
any pure strategy NE is analytically derived. To cope with the 
dynamic and incomplete information constraints, we propose a 
stochastic learning automata based distributed spectrum access 
algorithm, and prove its convergence towards pure strategy 
NE. Simulation results show that the proposed game-theoretic 
distributed solution achieves high throughput in time-varying 
environment. 

The rest of this letter is organized as follows. In Section II, 
the system model and problem formulation are presented. In 
Section III, we formulate the robust graphical game, analyze 
its properties in terms of the existence and throughput lower 
bound of NE, and propose a distributed learning algorithm 
to achieve desirable results. Finally, simulation results and 
discussion are presented in Section IV, and conclusion is drawn 
in Section V. 

II. System Model and Problem Formulation
A. System model 

We consider a network consisting of $N$ cognitive small
cell base stations (SBSs) and $M$ channels. Each cognitive
SBS chooses the operational channel, and its served user
equipments share the channel using some multiple access
mechanism. Thus, the task of distributed spectrum access is
performed by the SBSs. For ease of presentation, we will use SBS
and user interchangeably in the rest of this letter. Denote the
user set as $\mathcal{N}$, i.e., $\mathcal{N} = \{1, \ldots, N\}$, and the channel set as
$\mathcal{M}$, i.e., $\mathcal{M} = \{1, \ldots, M\}$.

Due to the dynamic and random deployment, it is not 
feasible to use traditional centralized optimization approaches 
to coordinate spectrum access for cognitive small cells. By 
employing the great advances in cognitive radio technique, 
sensing-based distributed spectrum access is desirable. A
cognitive SBS will access a channel if the received interference
from other SBSs is below a threshold 0. With such a 
spectrum access model, the transmission of a cognitive SBS 
only directly affects its neighbors due to the low transmitting 
power. Specifically, if the distance $d_{ij}$ between users $i$ and
$j$ is lower than a threshold $d_0$, they interfere with each
other when transmitting on the same channel. Therefore, the
potential interference relationship can be characterized by an
interference graph $\mathcal{G} = \{V, E\}$, where $V$ is the vertex set and
$E$ is the edge set, i.e., $V = \{1, \ldots, N\}$ and $E = \{(i, j) \mid i \in \mathcal{N}, j \in \mathcal{N}, d_{ij} < d_0\}$.
Denote the neighboring user set of user
$n$ as $\mathcal{J}_n$, i.e., $\mathcal{J}_n = \{j \in \mathcal{N} : d_{nj} < d_0\}$.
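The construction of the neighbor sets $\mathcal{J}_n$ can be sketched in a few lines of Python. This is only an illustration: the coordinates are hypothetical, the helper name `build_neighbors` is not from the letter, and the sketch assumes that a user is not counted as its own neighbor.

```python
import math

def build_neighbors(positions, d0):
    """Return {n: set of users j != n whose distance d_nj is below d0}."""
    neighbors = {n: set() for n in range(len(positions))}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            # An edge (i, j) exists when the two SBSs are closer than d0.
            if math.dist(positions[i], positions[j]) < d0:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors

# Hypothetical SBS coordinates in metres (not taken from the letter).
positions = [(0, 0), (200, 0), (450, 0), (1000, 1000)]
neighbors = build_neighbors(positions, d0=300)
print(neighbors)
```

With these coordinates, user 1 is adjacent to users 0 and 2, while user 3 is isolated and never experiences interference.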

The transmission rate of each channel is always time-varying
due to fading. To capture such fluctuations, the finite
rate channel model [17] is applied¹. With the help of adaptive
modulation and coding (AMC), each channel can support a
certain set of discrete transmission rates under different channel
conditions. Specifically, the rate set of channel $m$ is denoted
as $S_m = \{s_{m1}, s_{m2}, \ldots, s_{mK}\}$, where $s_{mk}$ indicates that the
channel can support a certain transmission rate (packets/slot).
Without loss of generality, we assume $s_{m1} < s_{m2} < \ldots < s_{mK}$.
The corresponding rate-state probabilities of channel
$m$ are given by $\Pi_m = \{\pi_{m1}, \ldots, \pi_{mK}\}$ and the expected
transmission rate is given by $\bar{s}_m = \sum_k \pi_{mk} s_{mk}$. Note that
the instantaneous transmission rates of two channels in each
slot may be the same or different. Their expected values,
however, are assumed to be the same², which implies that
we can denote the expected transmission rate of the channels
as $\bar{s} = \bar{s}_m, \forall m \in \mathcal{M}$.


B. Problem formulation

The task of each SBS is to choose an appropriate channel to
access. When more than one SBS chooses the same channel,
only one SBS can successfully access the channel since they
can hear the transmission of others 0, which is similar to
the carrier-sense-multiple-access strategy. For ease of presentation, we
assume that the overhead for resolving the contentions among
the users is negligible³. Denote $a_n(k)$ as the chosen channel of
SBS $n$ at slot $k$; then the instantaneous achievable transmission
rate of user $n$ is determined by:

$$ r_n(k) = \begin{cases} s_{a_n}(k), & \text{w.p. } \dfrac{1}{1 + \sum_{j \in \mathcal{J}_n} I(a_n, a_j)}, \\ 0, & \text{w.p. } 1 - \dfrac{1}{1 + \sum_{j \in \mathcal{J}_n} I(a_n, a_j)}, \end{cases} \qquad (1) $$

where $s_{a_n}(k)$ is the instantaneous transmission rate of channel
$a_n$ in slot $k$, $\mathcal{J}_n$ is the neighboring user set of $n$, and
$I(a_n, a_k)$ is the following indicator function:

$$ I(a_n, a_k) = \begin{cases} 1, & a_n = a_k, \\ 0, & a_n \neq a_k. \end{cases} \qquad (2) $$

Note that the instantaneous transmission rate $r_n(k)$ is dynamic
and random, and is jointly determined by the current channel
state and the channel selection profile of the neighbors.
Based on (1), the expected achievable transmission rate of user
$n$ is given by:

$$ R_n = \mathrm{E}[r_n(k)] = \frac{\bar{s}}{1 + c_n}, \qquad (3) $$

where $\mathrm{E}[\cdot]$ is the operation of taking expectation and

$$ c_n(a_1, \ldots, a_N) = \sum_{k \in \mathcal{J}_n} I(a_n, a_k). \qquad (4) $$

Therefore, the system-centric optimization goal is to find
the optimal channel selection profile such that the aggregate
expected throughput is maximized, i.e.,

$$ \text{P1}: \quad \max \sum_{n \in \mathcal{N}} R_n. \qquad (5) $$

It is known that information is key to decision-making
problems [18], and the information constraints arising in the
formulated optimization problem P1 are as follows:

• Dynamic: due to the dynamic features of wireless
transmissions, the instantaneous channel rates in each slot are
time-varying and not deterministic.

• Incomplete: in the absence of centralized control, a user
does not know the action profiles of others; furthermore,
since each user is equipped with a single
radio, the users only have information about the accessed
channel and do not know the states of other channels. In
addition, the users are unaware of the rate-state
probabilities of the channels.
¹ It should be pointed out that the finite rate model is used for the convenience of
presentation and simulation. When other time-varying channel models are applied,
the methodology and theoretical results obtained in this letter still hold.

² This assumption holds in most wireless networks. The channels in a
wireless network are generally in a close spectrum band, and hence have the
same transmission characteristics. Thus, their expected transmission rates are
the same.
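The rate model of this section, i.e., the interference count $c_n$, the expected rate $\bar{s}/(1 + c_n)$ and the random per-slot rate $r_n(k)$, can be sketched as follows. This is a minimal illustration under stated assumptions: the three-user line topology, the channel profile and the rate states are hypothetical, and the function names are not from the letter.

```python
import random

def interference_count(n, profile, neighbors):
    """c_n: number of neighbors of user n that chose the same channel."""
    return sum(1 for k in neighbors[n] if profile[k] == profile[n])

def expected_rate(n, profile, neighbors, s_bar):
    """R_n: expected achievable rate s_bar / (1 + c_n)."""
    return s_bar / (1 + interference_count(n, profile, neighbors))

def sample_rate(n, profile, neighbors, rates, probs, rng):
    """One draw of r_n(k): the user succeeds with probability 1 / (1 + c_n)."""
    c_n = interference_count(n, profile, neighbors)
    if rng.random() < 1.0 / (1 + c_n):
        return rng.choices(rates, weights=probs)[0]  # current channel rate state
    return 0

# Hypothetical line topology 0 - 1 - 2 and channel selection profile.
neighbors = {0: {1}, 1: {0, 2}, 2: {1}}
profile = [0, 0, 1]
rates, probs = [0, 1, 2, 3], [0.1, 0.4, 0.3, 0.2]    # illustrative rate states
s_bar = sum(p * s for p, s in zip(probs, rates))      # expected channel rate
aggregate = sum(expected_rate(n, profile, neighbors, s_bar) for n in neighbors)

# Monte Carlo estimate of user 0's expected rate, to compare with s_bar / 2.
rng = random.Random(0)
empirical = sum(sample_rate(0, profile, neighbors, rates, probs, rng)
                for _ in range(20000)) / 20000
print(s_bar, aggregate, empirical)
```

Users 0 and 1 share channel 0 and each obtain half of the expected channel rate, while user 2 is alone on channel 1 and obtains the full rate.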


III. Robust Graphical Game and Distributed
Learning for Distributed Spectrum Access

Since there is no centralized controller and the SBSs make
their decisions distributively and autonomously, we formulate
the problem of distributed spectrum access as a
non-cooperative game. In the following, we present the formulated
game model, analyze its properties, and propose a stochastic
distributed learning algorithm to converge to stable solutions
in time-varying spectrum environments.

³ It is emphasized that the analysis and results obtained in this letter can be
easily extended to other practical systems by multiplying a modified factor in
(1) and (3).

A. Game model

Formally, the distributed spectrum access game model is
denoted as $\mathcal{F} = [\mathcal{N}, \mathcal{G}, \{A_n\}_{n \in \mathcal{N}}, \{u_n\}_{n \in \mathcal{N}}]$, where $\mathcal{N} = \{1, \ldots, N\}$
is the set of players (SBSs), $\mathcal{G}$ is the potential
interference graph among the users, $A_n = \{1, \ldots, M\}$ is the set
of available actions (channels) for each player $n$, and $u_n$ is
the utility function of player $n$. In the considered problem, a
player is directly affected only by its neighbors, which means that
the utility function can be expressed as $u_n(a_n, a_{\mathcal{J}_n})$, where $a_n$
is the chosen action of player $n$ and $a_{\mathcal{J}_n}$ is the action profile
of the neighboring users of $n$. Furthermore, it is noted from
(1) that the achievable transmission rates of the users are random
in a slot, and we define the utility function as the expected
achievable transmission rate, i.e.,

$$ u_n(a_n, a_{\mathcal{J}_n}) = \frac{\mathrm{E}[s_{a_n}]}{1 + c_n}. \qquad (6) $$


It is seen that the utility function is defined over expectation,
and the interactions among the users are limited to neighboring
users. This is exactly why we call it a robust graphical
game. Each player in a non-cooperative game intends to
maximize its individual utility, i.e.,

$$ (\mathcal{F}): \quad \max_{a_n \in A_n} u_n(a_n, a_{\mathcal{J}_n}), \quad \forall n \in \mathcal{N}. \qquad (7) $$


B. Analysis of Nash equilibrium (NE) 

In this subsection, we present some definitions from the game theory
community and then investigate some important properties of
the formulated robust graphical game.

Definition 1 (Nash equilibrium [19]). A channel selection
profile $a^* = (a_1^*, \ldots, a_N^*)$ is a pure strategy NE if and only if
no player can improve its utility by deviating unilaterally, i.e.,

$$ u_n(a_n^*, a_{\mathcal{J}_n}^*) \geq u_n(a_n, a_{\mathcal{J}_n}^*), \quad \forall n \in \mathcal{N}, \forall a_n \in A_n, a_n \neq a_n^*. \qquad (8) $$

Definition 2 (Ordinal potential game [19]). A game is
an ordinal potential game (OPG) if there exists an ordinal
potential function $\phi : A_1 \times \cdots \times A_N \to \mathbb{R}$ such that for all
$n \in \mathcal{N}$, all $a_n \in A_n$, and all $a_n' \in A_n$, the following holds:

$$ u_n(a_n, a_{\mathcal{J}_n}) - u_n(a_n', a_{\mathcal{J}_n}) > 0 \Leftrightarrow \phi(a_n, a_{\mathcal{J}_n}) - \phi(a_n', a_{\mathcal{J}_n}) > 0. \qquad (9) $$

That is, the change in the utility function caused by the
unilateral action change of an arbitrary user has the same
sign as that in the ordinal potential function. It is known
that an OPG admits the following two promising features: (i)
every OPG has at least one pure strategy Nash equilibrium,
and (ii) an action profile that maximizes the ordinal potential
function is also a Nash equilibrium.
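Definition 1 can be checked mechanically for a given channel selection profile. The sketch below is illustrative only: it uses the expected-rate utility $\bar{s}/(1 + c_n)$ of the game model with $\bar{s}$ normalized to 1, and the triangle topology and function names are hypothetical.

```python
def same_channel_neighbors(n, profile, neighbors):
    """c_n: neighbors of user n on the same channel."""
    return sum(1 for k in neighbors[n] if profile[k] == profile[n])

def utility(n, profile, neighbors, s_bar=1.0):
    """Expected-rate utility s_bar / (1 + c_n)."""
    return s_bar / (1 + same_channel_neighbors(n, profile, neighbors))

def is_pure_ne(profile, neighbors, n_channels, s_bar=1.0):
    """True if no user can strictly raise its utility by a unilateral switch."""
    for n in neighbors:
        current = utility(n, profile, neighbors, s_bar)
        for m in range(n_channels):
            deviated = list(profile)
            deviated[n] = m  # only user n changes its channel
            if utility(n, deviated, neighbors, s_bar) > current:
                return False
    return True

# Hypothetical triangle topology (every pair interferes) with M = 2 channels.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(is_pure_ne([0, 0, 1], triangle, 2))  # True
print(is_pure_ne([0, 0, 0], triangle, 2))  # False
```

In the first profile, a user on the shared channel would still meet exactly one neighbor after switching, so nobody gains; in the second, any user can escape to the empty channel.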

The properties of the proposed robust graphical game are 
characterized by the following theorems. 

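Theorem 1. The robust graphical game $\mathcal{F}$ is an OPG and
hence has at least one pure strategy NE.

Proof: To prove this theorem, we first construct an ordinal
potential function as follows:

$$ \Phi(a_n, a_{-n}) = -\sum_{n \in \mathcal{N}} c_n(a_1, \ldots, a_N), \qquad (10) $$

where $c_n$ is characterized by (4).

For ease of presentation, denote $I_n(a_n, a_{\mathcal{J}_n})$ as the set of
neighboring users choosing the same channel as player $n$, i.e.,

$$ I_n(a_n, a_{\mathcal{J}_n}) = \{k \in \mathcal{J}_n : a_k = a_n\}, \qquad (11) $$

where $\mathcal{J}_n$ is the neighbor set of player $n$. Then, we have

$$ c_n = \sum_{k \in \mathcal{J}_n} I(a_n, a_k) = |I_n(a_n, a_{\mathcal{J}_n})|, \qquad (12) $$

where $I(a_n, a_k)$ is the indicator function characterized by (2),
and $|A|$ is the cardinality of set $A$, i.e., the number of elements
in $A$. If an arbitrary player $n$ unilaterally changes its channel
selection from $a_n$ to $a_n^*$, then the set of neighboring users
choosing the same channel as player $n$ after the unilateral
change is $I_n(a_n^*, a_{\mathcal{J}_n})$. Therefore, the change in the individual
utility function caused by this unilateral change is as follows:

$$ u_n(a_n^*, a_{\mathcal{J}_n}) - u_n(a_n, a_{\mathcal{J}_n}) = \frac{\mathrm{E}[s_{a_n^*}]}{1 + |I_n(a_n^*, a_{\mathcal{J}_n})|} - \frac{\mathrm{E}[s_{a_n}]}{1 + |I_n(a_n, a_{\mathcal{J}_n})|}. \qquad (13) $$

Also, the change in the potential function caused by the
unilateral change of player $n$ is as follows:

$$ \begin{aligned} \Phi(a_n^*, a_{-n}) - \Phi(a_n, a_{-n}) ={}& |I_n(a_n, a_{\mathcal{J}_n})| - |I_n(a_n^*, a_{\mathcal{J}_n})| \\ &+ \sum_{k \in I_n(a_n, a_{\mathcal{J}_n})} \left[ |I_k(a_k, a_{\mathcal{J}_k})| - |I_k(a_k, a_{\mathcal{J}_k}^*)| \right] \\ &+ \sum_{k \in I_n(a_n^*, a_{\mathcal{J}_n})} \left[ |I_k(a_k, a_{\mathcal{J}_k})| - |I_k(a_k, a_{\mathcal{J}_k}^*)| \right] \\ &+ \sum_{k \in \mathcal{K}, k \neq n} \left[ |I_k(a_k, a_{\mathcal{J}_k})| - |I_k(a_k, a_{\mathcal{J}_k}^*)| \right], \end{aligned} \qquad (14) $$

where $I_k(a_k, a_{\mathcal{J}_k}^*)$ is the set of neighboring users choosing
the same channel as player $k$ after the unilateral change of
the selection of player $n$. The set $\mathcal{K} = \mathcal{N} \setminus \{I_n(a_n, a_{\mathcal{J}_n}) \cup I_n(a_n^*, a_{\mathcal{J}_n})\}$,
where $A \setminus B$ means that $B$ is excluded from $A$,
denotes the set of users not interfering with player $n$ when it
chooses either $a_n$ or $a_n^*$. Due to the local interactions among the
users, the following equations hold:

$$ |I_k(a_k, a_{\mathcal{J}_k})| - |I_k(a_k, a_{\mathcal{J}_k}^*)| = 1, \quad \forall k \in I_n(a_n, a_{\mathcal{J}_n}), \qquad (15) $$

$$ |I_k(a_k, a_{\mathcal{J}_k})| - |I_k(a_k, a_{\mathcal{J}_k}^*)| = -1, \quad \forall k \in I_n(a_n^*, a_{\mathcal{J}_n}), \qquad (16) $$

$$ |I_k(a_k, a_{\mathcal{J}_k})| - |I_k(a_k, a_{\mathcal{J}_k}^*)| = 0, \quad \forall k \in \mathcal{K}, k \neq n. \qquad (17) $$

Based on (14)-(17), we have

$$ \Phi(a_n^*, a_{-n}) - \Phi(a_n, a_{-n}) = 2\left( |I_n(a_n, a_{\mathcal{J}_n})| - |I_n(a_n^*, a_{\mathcal{J}_n})| \right). \qquad (18) $$

Considering the fact that the expected transmission rates of
the channels are the same, equations (13) and (18) yield the
following inequality:

$$ \left( u_n(a_n^*, a_{\mathcal{J}_n}) - u_n(a_n, a_{\mathcal{J}_n}) \right) \left( \Phi(a_n^*, a_{-n}) - \Phi(a_n, a_{-n}) \right) = \frac{2\bar{s} \left( |I_n(a_n, a_{\mathcal{J}_n})| - |I_n(a_n^*, a_{\mathcal{J}_n})| \right)^2}{\left( 1 + |I_n(a_n^*, a_{\mathcal{J}_n})| \right) \left( 1 + |I_n(a_n, a_{\mathcal{J}_n})| \right)} \geq 0, \qquad (19) $$

which satisfies the definition of OPG, as characterized by (9).
Thus, the formulated robust graphical game $\mathcal{F}$ is an OPG,
which has at least one pure strategy Nash equilibrium. ■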
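The two key steps of the proof, the potential change of $2(|I_n(a_n, a_{\mathcal{J}_n})| - |I_n(a_n^*, a_{\mathcal{J}_n})|)$ in (18) and the sign agreement in (19), can be verified numerically by enumerating every unilateral deviation on a small graph. The sketch below is a sanity check rather than a proof; the random topology, the normalization $\bar{s} = 1$ and the function names are assumptions made for illustration.

```python
import itertools
import random

def c(n, profile, neighbors):
    """c_n = |I_n|: neighbors of n on the same channel."""
    return sum(1 for k in neighbors[n] if profile[k] == profile[n])

def potential(profile, neighbors):
    """Potential function: minus the sum of all c_n."""
    return -sum(c(n, profile, neighbors) for n in neighbors)

def verify_opg(neighbors, n_channels):
    """Check (18) and (19) for every unilateral deviation; return the count."""
    users = sorted(neighbors)  # users are assumed to be labelled 0..N-1
    checked = 0
    for profile in itertools.product(range(n_channels), repeat=len(users)):
        for n in users:
            for new in range(n_channels):
                if new == profile[n]:
                    continue
                dev = profile[:n] + (new,) + profile[n + 1:]
                d_phi = potential(dev, neighbors) - potential(profile, neighbors)
                x_old, x_new = c(n, profile, neighbors), c(n, dev, neighbors)
                assert d_phi == 2 * (x_old - x_new)            # equation (18)
                d_u = 1.0 / (1 + x_new) - 1.0 / (1 + x_old)    # (13) with s_bar = 1
                assert d_u * d_phi >= 0                        # equation (19)
                checked += 1
    return checked

# Hypothetical random topology: 6 users, each pair adjacent with probability 0.5.
rng = random.Random(1)
neighbors = {n: set() for n in range(6)}
for i, j in itertools.combinations(range(6), 2):
    if rng.random() < 0.5:
        neighbors[i].add(j)
        neighbors[j].add(i)
print(verify_opg(neighbors, n_channels=3))
```

The function returns the number of deviations examined (3^6 profiles, 6 users, 2 alternative channels each); any violation of (18) or (19) would raise an `AssertionError` instead.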








Theorem 2. For any network topology, the aggregate achievable
transmission rate of all the users at any NE point is
bounded by $U(a_{NE}) \geq \sum_{n \in \mathcal{N}} \frac{\bar{s} M}{M + |\mathcal{J}_n|}$.

Proof: For any pure strategy NE $a_{NE} = (a_1^*, \ldots, a_N^*)$,
according to the definition given in (8), the following inequality
holds for each user $n$, $\forall n \in \mathcal{N}$:

$$ u_n(a_n^*, a_{\mathcal{J}_n}^*) \geq u_n(a_n, a_{\mathcal{J}_n}^*), \quad \forall a_n \in A_n, a_n \neq a_n^*. \qquad (20) $$

Summing both sides of (20) over all $a_n \in A_n$ (the case $a_n = a_n^*$ holds with equality) yields the following:

$$ |A_n| \times u_n(a_n^*, a_{\mathcal{J}_n}^*) \geq \sum_{a_n \in A_n} u_n(a_n, a_{\mathcal{J}_n}^*), \qquad (21) $$

where $|A_n|$ denotes the number of channels in the system, i.e.,
$|A_n| = M$. Then, (21) can be rewritten as follows:

$$ u_n(a_n^*, a_{\mathcal{J}_n}^*) \geq \frac{\sum_{a_n \in A_n} u_n(a_n, a_{\mathcal{J}_n}^*)}{M}. \qquad (22) $$

It is seen that $\sum_{a_n \in A_n} u_n(a_n, a_{\mathcal{J}_n}^*)$ represents the aggregate
achievable transmission rate of player $n$ as if it
accessed all the channels simultaneously while the neighboring
users still transmit on only one channel. As a result, it can be
calculated as follows:

$$ \sum_{a_n \in A_n} u_n(a_n, a_{\mathcal{J}_n}^*) = \sum_{a_n \in A_n} \frac{\bar{s}}{1 + c_n(a_n, a_{\mathcal{J}_n}^*)}. \qquad (23) $$

Also, the following equation holds for any network topology:

$$ \sum_{a_n \in A_n} \left( 1 + c_n(a_n, a_{\mathcal{J}_n}^*) \right) = M + |\mathcal{J}_n|, \qquad (24) $$

where $|\mathcal{J}_n|$ is the number of neighboring users of user $n$.
From (23) and (24), it follows that

$$ \sum_{a_n \in A_n} u_n(a_n, a_{\mathcal{J}_n}^*) \geq \frac{\bar{s} M^2}{M + |\mathcal{J}_n|}, \qquad (25) $$

where we use the following basic inequality: for $x_n > 0$,
$\sum_{n=1}^{N} \frac{1}{x_n} \geq \frac{N^2}{a}$ if $\sum_{n=1}^{N} x_n = a$. Thus, (22) can be
rewritten as:

$$ u_n(a_n^*, a_{\mathcal{J}_n}^*) \geq \frac{\bar{s} M}{M + |\mathcal{J}_n|}. \qquad (26) $$

Finally, it follows that:

$$ U(a_{NE}) = \sum_{n \in \mathcal{N}} u_n(a_n^*, a_{\mathcal{J}_n}^*) \geq \sum_{n \in \mathcal{N}} \frac{\bar{s} M}{M + |\mathcal{J}_n|}, \qquad (27) $$

which proves Theorem 2. ■


Theorem 2 characterizes the lower throughput bound of the
formulated game. Some further discussions are given below:

• If there is only one channel available in the system, i.e.,
$M = 1$, we have $U(a_{NE}) = \sum_{n \in \mathcal{N}} \frac{\bar{s}}{1 + |\mathcal{J}_n|}$. In this case,
all users interfere with their neighboring users.

• If the number of channels grows sufficiently large, i.e.,
$M \to \infty$, we have $U(a_{NE}) \to \bar{s} N$. In this case, each
Nash equilibrium is a channel selection profile in which
all the users choose different channels.
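The bound of Theorem 2 can be illustrated on a small example. The sketch below finds one pure strategy NE by sequential best-response updates and compares its aggregate expected throughput with the bound. Best response is used here only as an offline way to obtain an NE; it is not the learning algorithm proposed in this letter, and the ring topology, $\bar{s} = 1$ and all function names are hypothetical.

```python
def best_response_ne(neighbors, n_channels, start):
    """Sequential best-response updates; stops at a pure strategy NE."""
    profile = list(start)
    improved = True
    while improved:
        improved = False
        for n in neighbors:
            # Count how many neighbors of n currently sit on each channel.
            counts = [0] * n_channels
            for k in neighbors[n]:
                counts[profile[k]] += 1
            best = min(range(n_channels), key=counts.__getitem__)
            if counts[best] < counts[profile[n]]:  # strict improvement only
                profile[n] = best
                improved = True
    return profile

def throughput(profile, neighbors, s_bar):
    """Aggregate expected rate: sum of s_bar / (1 + c_n) over all users."""
    return sum(
        s_bar / (1 + sum(1 for k in neighbors[n] if profile[k] == profile[n]))
        for n in neighbors
    )

def ne_lower_bound(neighbors, n_channels, s_bar):
    """Right-hand side of Theorem 2."""
    return sum(s_bar * n_channels / (n_channels + len(neighbors[n]))
               for n in neighbors)

# Hypothetical ring of 5 users with M = 2 channels and s_bar = 1.
ring = {n: {(n - 1) % 5, (n + 1) % 5} for n in range(5)}
ne = best_response_ne(ring, 2, start=[0] * 5)
print(ne, throughput(ne, ring, 1.0), ne_lower_bound(ring, 2, 1.0))
```

Each strict improvement increases the potential function, which is bounded, so the loop terminates. On this ring the NE reached achieves an aggregate rate of 4 against a bound of 2.5, which shows that the bound is a guarantee rather than a tight estimate.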

Remark 1. According to Theorem 1 and the properties of
ordinal potential games, each Nash equilibrium
of the game is also a (local or global) maximizer of the potential function
$\Phi(a_n, a_{-n})$. Furthermore, $c_n(a_1, \ldots, a_N)$ can be regarded
as the experienced MAC-layer interference level of user $n$
and $-\Phi(a_n, a_{-n})$ is the aggregate MAC-layer interference of
all the users. More importantly, it has been shown in
ED that MAC-layer interference minimization achieves
satisfactory performance in network throughput. Thus, the
Nash equilibria of the formulated robust graphical game are
expected to achieve high throughput.

Fig. 1. Schematic of the stochastic-learning-automata-based spectrum access
algorithm for cognitive small cell networks. In each slot, an SBS (i) chooses
channel $a_n(k)$ according to the current mixed strategy $q_n(k)$, (ii) contends
for the chosen channel, (iii) transmits data if it is the successful SBS, and
(iv) updates the mixed strategy $q_n(k+1)$ based on the random payoff $r_n(k)$.


C. Stochastic-learning-automata based distributed spectrum 
access algorithm 

There are a large number of learning algorithms for converging
towards stable solutions of ordinal potential games, e.g.,
best response dynamic ED and spatial adaptive play ED. 
Although these algorithms are implemented distributively, they
cannot be applied to the considered network due to the
following two strict constraints: (i) they need information
about other players, including the chosen actions and/or received
payoffs in each iteration, and (ii) they are only suitable for
static environments. In the following, we propose a stochastic
learning algorithm which converges to desirable solutions
in the dynamic wireless environment.

The proposed learning algorithm is based on the stochastic learning
automata ED. Denote $q_n(k) = \{q_{n1}(k), \ldots, q_{nM}(k)\}$ as
the mixed strategy of player $n$ in the $k$th slot, where $q_{nm}(k)$ is the
probability of choosing channel $m$. The stochastic learning
algorithm is described as follows: i) at the beginning of each
slot, the users choose the channels according to their mixed
strategies, ii) all the users access the chosen channels, and
iii) at the end of the slot, the users update their mixed strategies
based on the received random transmission rate. Specifically,
the schematic of the proposed learning algorithm is shown
in Fig. 1 and the learning procedure is as follows:


Initialization: set $k = 0$ and set the initial mixed strategy of
each user as $q_{nm}(k) = \frac{1}{M}$, $\forall n \in \mathcal{N}$, $\forall m \in \mathcal{M}$.

Loop for $k = 0, 1, 2, \ldots$,

a) Channel selection: at the beginning of the $k$th slot,
player $n$ randomly selects a channel $a_n(k)$ according to its
current mixed strategy $q_n(k)$.

b) Channel access: all the users access the channels
with the channel selection profile $\{a_1(k), \ldots, a_N(k)\}$, and
they receive the instantaneous transmission rate $r_n(k)$, which is
determined by (1).

c) Updating mixed strategy: all the users update their
mixed strategies according to the following rules:

$$ \begin{aligned} q_{nm}(k+1) &= q_{nm}(k) + b_n \tilde{r}_n(k) \left( 1 - q_{nm}(k) \right), & m = a_n(k), \\ q_{nm}(k+1) &= q_{nm}(k) - b_n \tilde{r}_n(k) q_{nm}(k), & m \neq a_n(k), \end{aligned} \qquad (28) $$

where $0 < b_n < 1$ is the learning step size of user $n$. In
addition, $\tilde{r}_n(k)$ is the normalized received payoff defined as
follows:

$$ \tilde{r}_n(k) = \frac{r_n(k)}{s_{mK}}, \qquad (29) $$

where $s_{mK}$ is the maximum transmission rate of the channels.

End loop
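The learning procedure above can be sketched as a short simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: each user's contention outcome is drawn independently with success probability $1/(1 + c_n)$, the non-chosen channels are updated with the standard linear reward-inaction rule so that each mixed strategy keeps summing to one, all users share one step size `step`, and the function name and the two-user topology are hypothetical.

```python
import random

def sla_spectrum_access(neighbors, n_channels, rates, probs,
                        step=0.1, n_slots=2000, seed=0):
    """Simulate the stochastic-learning-automata updates; return mixed strategies."""
    rng = random.Random(seed)
    users = list(neighbors)
    s_max = max(rates)  # normalizing constant: the largest channel rate
    q = {n: [1.0 / n_channels] * n_channels for n in users}  # uniform start
    for _ in range(n_slots):
        # a) Channel selection according to the current mixed strategies.
        choice = {n: rng.choices(range(n_channels), weights=q[n])[0] for n in users}
        for n in users:
            # b) Channel access: random payoff as in the rate model.
            c_n = sum(1 for k in neighbors[n] if choice[k] == choice[n])
            wins = rng.random() < 1.0 / (1 + c_n)
            r = rng.choices(rates, weights=probs)[0] if wins else 0
            r_norm = r / s_max
            # c) Mixed-strategy update (reward-inaction form).
            for m in range(n_channels):
                if m == choice[n]:
                    q[n][m] += step * r_norm * (1 - q[n][m])
                else:
                    q[n][m] -= step * r_norm * q[n][m]
    return q

# Hypothetical example: two mutually interfering SBSs and two channels,
# with the rate set and state probabilities quoted in Section IV.
rates = [0, 1, 2, 3, 6]
probs = [0.2791, 0.2117, 0.2514, 0.2566, 0.0013]
q = sla_spectrum_access({0: {1}, 1: {0}}, 2, rates, probs, step=0.1, n_slots=500)
print(q)
```

Each user only uses its own action and received payoff, which is the property that makes the scheme usable without information exchange. A smaller `step` slows learning but follows the small-step regime assumed in the convergence analysis.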


The asymptotic convergence performance of the proposed
learning algorithm is characterized by the following theorem.

Theorem 3. When the learning step size is sufficiently
small, i.e., $b_n \to 0$, the proposed stochastic learning algorithm
asymptotically converges to a pure strategy NE point of the
formulated robust graphical game $\mathcal{F}$.

Proof: In [21], it has been rigorously proved that the
stochastic learning automata converge to a pure strategy NE of
any exact potential game. Ordinal potential games have some
common properties with exact potential games, and the key
difference is as follows: in exact potential games, the change
in the utility function caused by the unilateral action change of
an arbitrary user is the same as that in the potential function,
i.e.,

$$ u_n(a_n, a_{-n}) - u_n(a_n', a_{-n}) = \phi(a_n, a_{-n}) - \phi(a_n', a_{-n}). \qquad (30) $$

The following relationship is key for the convergence proof
for exact potential games in [21] (see equation (C.40) therein):

$$ \left( u_n(a_n, a_{\mathcal{J}_n}) - u_n(a_n', a_{\mathcal{J}_n}) \right) \left( \phi(a_n, a_{\mathcal{J}_n}) - \phi(a_n', a_{\mathcal{J}_n}) \right) \geq 0. \qquad (31) $$

Note that ordinal potential games also admit the above
relationship (see equation (19)). Thus, similar lines of the
proof given in [21] (Theorem 5) can be applied to prove this
theorem. Due to the limited space and to avoid unnecessary
repetition, the detailed proof is omitted. ■

It is noted that the proposed distributed spectrum access
algorithm is fully distributed as it only needs individual
action-payoff information. Furthermore, the update rule, as
characterized by (28), is simple to implement in practice.

IV. Simulation Results and Discussion 
A. Scenario setup 

The finite rate channel model is applied to characterize 
the time-varying spectrum environment. Specifically, using the 
technology of adaptive modulation and coding, the channel 
transmission rate is divided into several finite states according 
to the received instantaneous signal-to-noise-ratio (SNR). We 
consider the HIPERLAN/2 standard [22], in which the channel 
rate set is {0,1,2,3, 6}. Note that the rate is defined as the 
number of transmitted packets in a slot. We consider Rayleigh
fading in the simulation study. Using the method proposed in
[17], the state probabilities can be obtained for a given average
SNR ($\bar{\gamma}$) and a certain packet error rate ($p_e$). Taking $\bar{\gamma} = 6$
dB and $p_e = 10^{-3}$ as an example, the state probabilities are
given by $\pi = \{0.2791, 0.2117, 0.2514, 0.2566, 0.0013\}$. The
channel state changes randomly from slot to slot.

Fig. 2. A network consisting of fifteen cognitive small cells.

Fig. 3. The evolution of channel selection probabilities of three arbitrarily
chosen SBSs, plotted against the iteration time index $k$ ($\bar{\gamma} = 5$ dB, $M = 3$, $N = 15$).
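As a quick worked check (not part of the original letter), the expected channel rate implied by the rate set $\{0, 1, 2, 3, 6\}$ and the state probabilities quoted in this subsection follows directly from the definition $\bar{s} = \sum_k \pi_k s_k$:

```latex
\bar{s} = 0(0.2791) + 1(0.2117) + 2(0.2514) + 3(0.2566) + 6(0.0013)
        = 0.2117 + 0.5028 + 0.7698 + 0.0078
        = 1.4921 \ \text{packets/slot}
```

For this rate set, the normalizing constant used in (29) is the largest rate, i.e., 6 packets/slot.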

In all the simulations, the interference range between
different SBSs is set to $d_0 = 300$ m and the number of channels
is set to $M = 3$. The step size of the learning algorithm is set
to $b = 0.25$. The simulation results are obtained by averaging
over 1000 independent trials.

For comparison, we evaluate the throughput performance of
the proposed stochastic learning algorithm, the spatial adaptive
play with neighboring cooperation (SAP-NC) [13] and the
simultaneous log-linear learning algorithm (S-logit) [23]. The
SAP-NC algorithm needs local information exchange among
neighboring users while the S-logit algorithm only needs the
individual action-payoff information. Note that both are only
suitable for static environments. In order to apply these algorithms,
it is assumed that there is an omnipotent genie, which knows
the channel statistics perfectly and regards the time-varying
channels as static channels with fixed transmission rates (the
average transmission rates). Since both SAP-NC and S-logit
algorithms asymptotically converge to an action profile that
maximizes the potential function of (ordinal) potential games,
it is expected that their performance would be very close to the
optimal solution [13], [23]. However, it should be emphasized
that both SAP-NC and S-logit algorithms actually cannot
be applied in the considered time-varying environment. For
ease of presentation, they are called Genie-aided SAP-NC and
Genie-aided S-logit, respectively.

Fig. 4. The throughput performance comparison when varying the average
received SNR ($M = 3$, $N = 15$).

Fig. 5. The throughput performance comparison for different network scales
($\bar{\gamma} = 5$ dB and $M = 3$).

To begin with, we consider a network consisting of fifteen
users, as shown in Fig. 2. To study the convergence behavior,
the evolution of the channel selection probabilities of three
arbitrarily chosen users is shown in Fig. 3. It is noted that
they converge to a pure strategy in about 600 iterations. This
validates the convergence of the proposed learning algorithm
in a time-varying environment.

The throughput performance comparison when varying the
average received SNR is shown in Fig. 4. It is noted from
the figure that the performance of the proposed learning
algorithm is very close to that of the Genie-aided SAP-NC
algorithm and better than that of the Genie-aided S-logit algorithm. In
addition, the throughput performance comparison for different
network scales is shown in Fig. 5. It is noted from the figure that
the proposed learning algorithm achieves the same throughput
as the Genie-aided SAP-NC algorithm and outperforms the
Genie-aided S-logit algorithm.

To summarize, the simulation results show that the proposed
learning algorithm achieves almost the same throughput
performance as the Genie-aided benchmark algorithms.
Considering that both Genie-aided SAP-NC and Genie-aided
S-logit algorithms are designed for static environments and rely on
perfect knowledge of the channel statistics, we conclude that the
proposed game-theoretic distributed learning in time-varying
spectrum environments is desirable for small cell networks.

V. Conclusion 

We investigated the problem of distributed spectrum access
for cognitive small cell networks. Compared with most existing
work, two inherent features are considered: i) the
transmission of a cognitive small cell base station only interferes with its
neighbors due to the low power, i.e., the interference is local,
and ii) the channel state is time-varying due to fading. We
formulated the problem as a robust graphical game, and proved
that it is an ordinal potential game which has at least one pure
strategy Nash equilibrium (NE). Also, the lower throughput
bound of NE solutions was analytically obtained. We proposed
a distributed spectrum access algorithm that converges to
stable results. Simulation results validate the effectiveness of
the proposed game-theoretic distributed learning solution in
time-varying spectrum environments.